CHAPTER 4. ASYMPTOTICS OF POSTERIORS AND MODEL SELECTION

4.1 Consistency of posteriors. Given a measurable family $\{P_\theta, \theta \in \Theta\}$, dominated
by a σ-finite measure $v$, for a measurable space $(\Theta, \mathcal{T})$, a prior $\pi$ on $\Theta$, and
observations $X_1, X_2, \ldots$ i.i.d. $(P_{\theta_0})$ for some $\theta_0 \in \Theta$, we have posteriors
$\pi_{x,n}$ on $\Theta$ where $x = (X_1, \ldots, X_n)$ for each $n$. Recall that we have defined the
posteriors by multiplying the prior by the likelihood function $\prod_{j=1}^n f(\theta, X_j)$ and
normalizing the result, if possible, to be a probability measure (Proposition 1.3.5). (For
Theorem 4.1.4 below, where $\{P_\theta, \theta \in \Theta\}$ is not necessarily dominated, a more general
definition of posteriors will be used.) The posteriors will be called consistent if for every
neighborhood $U$ of $\theta_0$, $\pi_{x,n}(U) \to 1$ almost surely as $n \to \infty$. This form of consistency
is not for estimators but is just a property of the prior and the likelihood function.
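The definition can be illustrated numerically. The following is a minimal sketch, not part of the text, for a hypothetical three-point parameter set with Bernoulli observations: `posterior` multiplies the prior by the likelihood and normalizes, and `mass_at_truth` (a name introduced here) reports the posterior mass at $\theta_0 = 0.8$ for an idealized sample containing exactly 80% ones.

```python
import math

def posterior(prior, log_lik):
    """Normalize prior[i] * exp(log_lik[i]) into a probability vector.

    Assumes at least one point has positive prior mass and finite log-likelihood.
    """
    log_w = [math.log(p) + l if p > 0 else -math.inf
             for p, l in zip(prior, log_lik)]
    top = max(log_w)                      # subtract the max for numerical stability
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

def bernoulli_log_lik(theta, ones, zeros):
    """Log-likelihood of `ones` successes and `zeros` failures under Bernoulli(theta)."""
    return ones * math.log(theta) + zeros * math.log(1.0 - theta)

# Hypothetical setup: three candidate parameters, uniform prior, true value 0.8.
thetas = [0.2, 0.5, 0.8]
prior = [1.0 / 3, 1.0 / 3, 1.0 / 3]

def mass_at_truth(n):
    """Posterior mass at theta = 0.8 for an idealized sample of size n with 80% ones."""
    ones = (8 * n) // 10
    log_lik = [bernoulli_log_lik(t, ones, n - ones) for t in thetas]
    return posterior(prior, log_lik)[2]
```

In this toy setting the posterior mass of the neighborhood $\{0.8\}$ of the true parameter increases toward 1 as $n$ grows, which is what consistency asks for almost surely under random sampling.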

In some situations, consistency of posteriors can lead to consistency of estimators.
For example, if $\Theta$ is an interval in $\mathbb{R}$, and $T_n$ is a median of the posterior law
$\pi_{x,n}$, then consistency of posteriors will imply that the $T_n$ are consistent. If the
interval is bounded, $T_n$ could also be taken as the mean of $\pi_{x,n}$.
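The reason is that once a neighborhood of $\theta_0$ carries posterior mass greater than 1/2, every posterior median must lie in that neighborhood. A small sketch for a posterior supported on finitely many points follows; the function names are introduced here for illustration only.

```python
def posterior_median(thetas, probs):
    """Smallest theta at which the cumulative posterior mass reaches 1/2.

    `thetas` must be sorted in increasing order and `probs` must sum to 1.
    """
    cumulative = 0.0
    for t, p in zip(thetas, probs):
        cumulative += p
        if cumulative >= 0.5:
            return t
    return thetas[-1]  # only reached through rounding error in `probs`

def posterior_mean(thetas, probs):
    """Mean of a discrete posterior; finite whenever the thetas are bounded."""
    return sum(t * p for t, p in zip(thetas, probs))
```

For example, with mass 0.98 at 0.5 and mass 0.01 at each of 0 and 1, both the median and the mean equal 0.5.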

If the prior $\pi$ has $\pi(U) = 0$ for some neighborhood $U$ of the true parameter $\theta_0$, then
$\pi_{x,n}(U) = 0$ for all $x$ and $n$, so the posteriors can't be consistent. On the other hand, if
$\pi(U) > 0$ for every neighborhood $U$ of $\theta_0$, then under some conditions as in Section 3.3, it
will be shown that the posteriors are consistent. It can happen in pathological cases that
the posteriors are not consistent, for example if as the neighborhoods $U$ shrink to $\{\theta_0\}$,
$\pi(U) \to 0$ very fast, and if the likelihood function doesn't behave well. Such an example
will be given in Proposition 4.1.2 and after it.
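The first claim can be read off from the definition of the posterior as a normalized product of prior and likelihood. As a sketch, writing $Z_n$ (a symbol introduced here, not in the text) for the normalizing constant and assuming $0 < Z_n < \infty$:

$$\pi_{x,n}(U) = \frac{1}{Z_n} \int_U \prod_{j=1}^n f(\theta, X_j)\, d\pi(\theta), \qquad Z_n := \int_\Theta \prod_{j=1}^n f(\theta, X_j)\, d\pi(\theta).$$

If $\pi(U) = 0$, the integral over $U$ of a nonnegative function with respect to $\pi$ is 0, so $\pi_{x,n}(U) = 0$ for every $n$.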

4.1.1 Theorem. Assume that:

(i) $\{P_\theta, \theta \in \Theta\}$ is a measurable family, dominated by a σ-finite measure $v$, and
identifiable, so that $P_\theta \ne P_\phi$ for $\theta \ne \phi$;

(ii) $\Theta$ is a locally compact separable metric space, with Borel σ-algebra $\mathcal{T}$;

(iii) $(dP_\theta/dv)(x) = f(\theta, x)$ where $f(\cdot,\cdot)$ is jointly measurable;

(iv) $P = P_{\theta_0}$ for some $\theta_0 \in \Theta$, and $X_1, X_2, \ldots$ are i.i.d. $(P)$;

(v) $h(\cdot, x) := \log f(\theta_0, x) - \log f(\cdot, x)$ is continuous on $\Theta$;

(vi) For some positive, continuous function $b(\cdot)$ on $\Theta$ and integrable function $u(\cdot)$ on $X$
for $P_{\theta_0}$, $|h(\theta, x)| \le b(\theta) u(x)$ for all $\theta$ and almost all $x$;

(vii) (3.3.6) and (3.3.7) hold for the given $h$ and $b(\cdot)$, that is,
$\liminf_{\theta \to \infty} b(\theta) > \gamma(\theta_0) = 0$ and
$E[\liminf_{\theta \to \infty} h(\theta, x)/b(\theta)] > 1$.

Then for any prior $\pi$ such that $\pi(U) > 0$ for every neighborhood $U$ of $\theta_0$, the posteriors
are consistent.

Notes. In (v), $h(\cdot,\cdot)$ has been chosen to incorporate an adjustment function so that, in
the notation of Section 3.3, $a(x) = 0$. Here continuity of $h$ in $\theta$ is assumed in (v), rather
than the lower semicontinuity assumed in Section 3.3, (A-2). This is needed in order to
allow general priors. Suppose that $f(\theta_0, x)$ were the lim sup, not the limit, of $f(\theta, x)$ as
$\theta \to \theta_0$ and that for some $\varepsilon > 0$ and sequence $\theta_k \to \theta_0$, $f(\theta_k, x) < f(\theta_0, x) - \varepsilon$ for $x$ in
a set $A$ with $P(A) > 0$ and all $k$. Then if the prior $\pi$ is concentrated on points $\theta$ where
$f(\theta, x) < f(\theta_0, x) - \varepsilon$ for $x \in A$, we can have $\pi(U) > 0$ for every neighborhood $U$ of $\theta_0$,
but the posteriors might not be consistent.

Proof. Continuity of $h(\cdot, x)$ implies assumptions (A-1) and (A-2) of Section 3.3, where $S$
is any countable dense set in $\Theta$ and $A = \emptyset$. Assumption (vi) implies that
$E_{\theta_0} |h(\theta, x)| < \infty$ for all $\theta$, which is stronger than (A-3). In Theorem 3.3.16, using
identifiability, (A-4) is shown to hold in this case. Condition (vi) is stronger than (3.3.5),
and the other parts of (A-5) are assumed, so all of (A-1) through (A-5) hold. Lemma 3.3.9
doesn't involve any estimators $T_n$, so it still holds. Also, now that continuity of $h(\cdot, x)$
and (vi) are assumed, the Lemma also holds with sup instead of inf, so $\gamma(\cdot)$ is continuous.
Note that $\gamma(\theta_0) = 0$ in the present case.

For any neighborhood $U$ of $\theta_0$, there is an $\varepsilon > 0$ such that almost surely for $n$ large
enough,

$$\prod_{i=1}^n f(\theta, X_i)/f(\theta_0, X_i) < \exp(-n\varepsilon)$$

for all $\theta \notin U$: this follows from the proof of (3.3.14) for $\theta$ in some compact set $C$ and from
(3.3.12) for $\theta \notin C$. These proofs do not involve $T_n$. On the other hand, for a small enough
neighborhood $V \subset U$, by Lemma 3.3.9, almost surely for $n$ large enough, by the strong
law of large numbers, for each $\theta \in V$,

$$\prod_{i=1}^n f(\theta, X_i)/f(\theta_0, X_i) > \exp(-n\varepsilon/2).$$

Then for each $\phi \notin U$ and $\theta \in V$, the likelihood ratio for $n$ observations satisfies

$$R_{\theta,\phi}(X_1, \ldots, X_n) > e^{n\varepsilon/2}.$$
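This bound can be obtained by dividing the two preceding product inequalities, since the factors $f(\theta_0, X_i)$ cancel. A sketch, assuming (as the notation suggests) that $R_{\theta,\phi}$ denotes the ratio of the likelihood at $\theta$ to the likelihood at $\phi$, read as $+\infty$ if the denominator is 0:

$$R_{\theta,\phi}(X_1, \ldots, X_n) = \frac{\prod_{i=1}^n f(\theta, X_i)/f(\theta_0, X_i)}{\prod_{i=1}^n f(\phi, X_i)/f(\theta_0, X_i)} > \frac{\exp(-n\varepsilon/2)}{\exp(-n\varepsilon)} = e^{n\varepsilon/2}.$$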

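Since $\pi(V) > 0$, the ratio of posteriors $\pi_{x,n}(V)/\pi_{x,n}(\Theta \setminus U) \to \infty$, which implies that
$\pi_{x,n}(\Theta \setminus U) \to 0$, and so $\pi_{x,n}(U) \to 1$ almost surely. □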

The following will give examples where posteriors are not consistent: 

4.1.2 Proposition. Suppose $\{P_\theta, \theta \in \Theta\}$ is a family with densities $f(\theta, x)$ such that for
a metric $d$ on $\Theta$, some $\theta_0 \in \Theta$, $P = P_{\theta_0}$, and a sequence $\theta_m$ converging to $\theta_0$ for $d$, we have

$$0 < a_m := I(P_{\theta_0}, P_{\theta_m}) = -E \log(f(\theta_m, \cdot)/f(\theta_0, \cdot)) \uparrow C$$

strictly as $m \to \infty$ where $0 < C < +\infty$. Then there is a prior $\pi$ on the sequence $\{\theta_m\}$
with $\pi(\theta_m) > 0$ for all $m \ge 1$, and so with $\pi(U) > 0$ for every neighborhood $U$ of $\theta_0$, such
that for $X_1, X_2, \ldots$ i.i.d. $P$, the posteriors are not consistent.

Proof. Let $Y_m(x) := \log(f(\theta_m, x)/f(\theta_0, x))$. Since $a_m = -EY_m > 0$ by Lemma
3.3.15 and $a_m < C < +\infty$, it follows that $Y_m > -\infty$ a.s., so $f(\theta_m, \cdot) > 0$ a.s. Let
$Y_{mn} := \sum_{j=1}^n Y_m(X_j)/n$. Then $Y_{mn} \to -a_m$ a.s. as $n \to \infty$ by the strong law of large
numbers, so for each $m = 1, 2, \ldots$, there is an $n_0 := n_0(m)$ such that

$$\Pr\{\text{for some } n \ge n_0 :\ Y_{mn} < -a_m - (a_{m+1} - a_m)/3 \ \text{ or } \ Y_{m+1,n} > -a_{m+1} + (a_{m+1} - a_m)/3\} < 1/2^m. \tag{4.1.3}$$
Also choose $n_0(m)$ large enough so that $\exp(-n_0(m)(a_{m+1} - a_m)/3) < 1/2$. Let
$L_{nm} := \prod_{j=1}^n f(\theta_m, X_j)$. Since $f(\theta_m, X_j) > 0$ a.s. for each $j$, there is always an
$a_{nm} > 0$ such that

$$\Pr(L_{nm} < a_{nm}) < 1/(2^m n_0(m)).$$

Also, there are always $b_{nm} < \infty$ large enough so that

$$\Pr(L_{nm} > b_{nm}) < 1/(2^m n_0(m)).$$

Let $r_m := 2 + 2 \cdot \max\{b_{n,m+1}/a_{nm} : n < n_0(m)\}$. Then the prior $\pi$ will be defined so
that $\pi(\theta_{m+1})/\pi(\theta_m) = 1/r_m$. In other words, for the suitable normalizing constant $c$, let
$\pi(\theta_1) := c$ and $\pi(\theta_m) := c/\prod_{j=1}^{m-1} r_j$ for $m \ge 2$. Then for each $n < n_0(m)$, the posterior
probability $\pi_{x,n}(\theta_{m+1})$ can be larger than $\pi_{x,n}(\theta_m)/2$ only if $L_{n,m+1}/L_{nm} > r_m/2$, which
requires either $L_{n,m+1} > b_{n,m+1}$ or $L_{nm} < a_{nm}$. For a given $m$, the probability that any
of these events occur for $n < n_0(m)$ is at most $2/2^m$. Also, except on the event in (4.1.3),
we have for $n \ge n_0(m)$ that $Y_{m+1,n} < Y_{mn}$ and so $L_{n,m+1}/L_{nm} < 1$. So, for each $m$,
except on an event of probability at most $3/2^m$, we have $\pi_{x,n}(\theta_{m+1}) \le \pi_{x,n}(\theta_m)/2$ for
all $n$. Since $\sum_m 3/2^m < \infty$, by the Borel-Cantelli lemma (RAP, Theorem 8.3.4), almost
surely there is an $m_0$ such that for all $m \ge m_0$ and all $n$, $\pi_{x,n}(\theta_{m+1}) \le \pi_{x,n}(\theta_m)/2$. Then
$\sum_{m > m_0} \pi_{x,n}(\theta_m) \le \pi_{x,n}(\theta_{m_0})$, so the left side of the latter inequality can't converge to
1, and the posteriors can't be consistent. □
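The last step is a geometric-series bound. Spelled out as a sketch: iterating $\pi_{x,n}(\theta_{m+1}) \le \pi_{x,n}(\theta_m)/2$ for $m \ge m_0$ gives $\pi_{x,n}(\theta_{m_0+k}) \le 2^{-k} \pi_{x,n}(\theta_{m_0})$ for $k \ge 1$, so

$$\sum_{m > m_0} \pi_{x,n}(\theta_m) \le \pi_{x,n}(\theta_{m_0}) \sum_{k=1}^\infty 2^{-k} = \pi_{x,n}(\theta_{m_0}).$$

The two sides are posterior masses of disjoint sets, so they add up to at most 1, and therefore $\sum_{m > m_0} \pi_{x,n}(\theta_m) \le 1/2$ for all $n$. A neighborhood of $\theta_0$ that excludes $\theta_1, \ldots, \theta_{m_0}$ meets the support of the prior only in $\{\theta_m : m > m_0\}$, so its posterior mass stays at most 1/2.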

For a specific example where the last proposition applies, let the sample space be the
open interval $0 < x < 1$ with $v$ = Lebesgue measure. For $m \ge 2$ let $f_m$ be continuous,
$f_m(x) := e^{-m}$ for $0 < x \le 1/m$, let $f_m$ be linear on the interval $1/m \le x \le 1/m + e^{-m}$
and let $f_m$ be constant for $1/m + e^{-m} \le x < 1$. The constant is $1 + 1/m + o(1/m)$. (A
simpler example could be defined, constant on $(0, 1/m)$ and on $[1/m, 1)$.) Let $\theta_m := 1/m$
and $f(\theta_m, x) := f_m(x)$. Let $\theta_0 := 0$ and $f(0, x) = 1$, giving the uniform distribution.
Then $f_m(x) \to 1$ as $m \to \infty$ for $0 < x < 1$, so $f(\cdot, x)$ is continuous in $\theta$ on the sequence
where it is defined. We have $E(\log f_m) = -1 + 1/m + o(1/m)$ as $m \to \infty$ and, taking a
subsequence, we can assume that the convergence of these integrals is strictly monotone.
Then Proposition 4.1.2 applies with $C = 1$.
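The asymptotics of $E(\log f_m)$ can be checked numerically. The sketch below is an illustration, not part of the text: it assumes the piecewise definition reads $f_m = e^{-m}$ on $(0, 1/m]$, linear over a further width $e^{-m}$, then constant, solves for the constant from $\int_0^1 f_m = 1$, and evaluates $a_m = -E \log f_m$ under the uniform law in closed form. The function names are introduced here.

```python
import math

def normalizing_constant(m):
    """Constant value c of f_m on [1/m + e^{-m}, 1) that makes f_m integrate to 1."""
    a = math.exp(-m)  # height of f_m on (0, 1/m], and also the width of the linear piece
    # Areas: a/m (flat) + a*(a + c)/2 (trapezoid) + c*(1 - 1/m - a) (flat) = 1, solved for c.
    return (1.0 - a / m - a * a / 2.0) / (1.0 - 1.0 / m - a / 2.0)

def kl_from_uniform(m):
    """a_m = -E log f_m, the expectation being under the uniform law on (0, 1)."""
    a = math.exp(-m)
    c = normalizing_constant(m)
    flat_low = (1.0 / m) * (-m)  # log f_m = -m on an interval of length 1/m
    # On the linear piece f_m runs from a to c over width a; substitute y = f_m(x)
    # and use the antiderivative y*log(y) - y, with a*log(a) = -m*a.
    ramp = a / (c - a) * ((c * math.log(c) - c) - (a * (-m) - a))
    flat_high = (1.0 - 1.0 / m - a) * math.log(c)
    return -(flat_low + ramp + flat_high)
```

Under these assumptions `kl_from_uniform(m)` stays below 1 and tracks $1 - 1/m$ with an error of order $1/m^2$, in line with $a_m \uparrow C = 1$.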

In another example, if $P_\theta$ is uniform on $[\theta, 1]$, where $0 \le \theta < 1$, we will have consistency
if the true $\theta_0 = 0$, for any prior $\pi$ with $\pi(U) > 0$ for every neighborhood $U$ of 0, even
though $a_m = +\infty$ for all $m$.
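A small sketch of why consistency holds here, under the assumption of a prior supported on a finite grid that contains 0 (the grid, the data, and the function name are hypothetical): the likelihood vanishes for every $\theta$ above the smallest observation, so all posterior mass sits in $[0, \min_j X_j]$, and $\min_j X_j \to 0$ almost surely when $\theta_0 = 0$.

```python
def posterior_uniform_family(xs, grid, prior):
    """Posterior on a finite grid of theta values for P_theta = Uniform[theta, 1].

    Assumes the grid contains a point <= min(xs) with positive prior mass,
    so that the normalizing constant is positive.
    """
    n = len(xs)
    smallest = min(xs)
    # The density of P_theta is 1/(1 - theta) on [theta, 1], so the likelihood is
    # (1 - theta)^(-n) if theta <= min(xs) and 0 otherwise.
    weights = [p * (1.0 - t) ** (-n) if t <= smallest else 0.0
               for t, p in zip(grid, prior)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical illustration: uniform prior on {0, 0.01, ..., 0.99}, three observations.
grid = [i / 100 for i in range(100)]
prior = [1.0 / len(grid)] * len(grid)
post = posterior_uniform_family([0.5, 0.03, 0.9], grid, prior)
```

With these three observations the posterior is supported on the grid points not exceeding 0.03, whatever the prior weights on the rest of the grid.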

The non-consistency at one point $\theta_0$ in Proposition 4.1.2 and the example after it result
from (a) peculiar behavior of the likelihood function as $\theta \to \theta_0$, so that although $f(\cdot, x)$
is continuous, $P_{\theta_m}$ moves further away from $P_{\theta_0}$ in terms of Kullback-Leibler divergence
as $m \to \infty$, and (b) very fast decrease of the prior probabilities of neighborhoods of $\theta_0$ as
they shrink to $\theta_0$. It may not be surprising, then, that such behavior is exceptional and
can only happen on a set of prior probability 0, under quite general conditions, as follows.

Suppose given a parameter space $\Theta$ with a σ-algebra $\mathcal{B}$ of subsets and a prior
probability distribution $\pi$ on $\mathcal{B}$. Let $(X, \mathcal{A})$ be a sample space and let $X^\infty$ be the set of all
sequences $\{X_n\}_{n=1}^\infty$ with $X_n \in X$ for all $n$. On $X^\infty$ we have the product σ-algebra, the
smallest σ-algebra making each $X_n$ measurable. Suppose that for each $\theta \in \Theta$, a probability
measure $\Pr_\theta$ is given on $X^\infty$ and that the family $\Pr_\theta$, $\theta \in \Theta$, is measurable. We get by
Proposition 1.3.5 with $X^\infty$ in place of $X$ a joint distribution $\Pr$ on $\Theta \times X^\infty$ where $\theta$ has
marginal distribution $\pi$ and, for each $\theta \in \Theta$, $\{X_n\}_{n \ge 1}$ have conditional distribution $\Pr_\theta$.

Posterior probabilities can be defined for families that are not necessarily dominated by
a σ-finite measure, as follows. Let $\Pr$ be a probability on the product σ-algebra in $\Theta \times X^\infty$.
Let $X^{(n)} := (X_1, \ldots, X_n)$. A function $(A, X^{(n)}) \mapsto \Pr(A | X^{(n)})$ from $\mathcal{B} \times X^n$ into $[0, 1]$ is a
regular conditional probability for $\Pr$ of $A$ given $X^{(n)}$ if, for each $A \in \mathcal{B}$, $\Pr(A, \cdot)$ equals the
conditional probability of $A$ given $X^{(n)}$, and for each $X^{(n)} \in X^n$, $\Pr(\cdot, X^{(n)})$ is a countably
additive probability measure on $\mathcal{B}$. Then a general definition of the posterior probability
$\pi_{x,n}$ on $\Theta$ is that it is a regular conditional probability $\Pr(\cdot, X^{(n)})$ if one exists, for $\Pr$
defined as above via Proposition 1.3.5. For dominated families $\{P_\theta, \theta \in \Theta\}$, posteriors as
defined via likelihood functions were shown in Theorem 1.3.7 to exist almost surely, and
the two definitions of posteriors will agree.

Let's say that the family $\Pr_\theta$, $\theta \in \Theta$, is empirically identifiable if for some measurable
function $T$ from $X^\infty$ into $\Theta$, for each $\theta \in \Theta$, $T(x) = \theta$ for $\Pr_\theta$-almost all $x \in X^\infty$. This
occurs, for example, if there are estimators $T_n(X_1, \ldots, X_n)$ converging to $\theta$ $\Pr_\theta$-a.s. as
$n \to \infty$ for each $\theta$.
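As an illustration that is not taken from the text, let $X = \{0, 1\}$, $\Theta = [0, 1]$, and let $\Pr_\theta$ make the coordinates i.i.d. Bernoulli($\theta$). One possible choice of $T$ is

$$T(x) := \limsup_{n \to \infty} \frac{1}{n} \sum_{j=1}^n x_j, \qquad x = \{x_j\}_{j \ge 1} \in X^\infty.$$

This $T$ is a measurable map from $X^\infty$ into $[0, 1]$, and by the strong law of large numbers $T(x) = \theta$ for $\Pr_\theta$-almost all $x$, for every $\theta$, so the family is empirically identifiable.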

4.1.4 Theorem (Doob). Let $\Pr_\theta$, $\theta \in \Theta$, be a measurable, empirically identifiable family.
Suppose that $\Theta$ is a Borel subset of a complete separable metric space with Borel σ-algebra.
Let $\pi$ be a prior probability on $\Theta$ with $\pi(U) > 0$ for every non-empty open set $U$. Then
the posteriors $\pi_{x,n}$ exist almost surely and are consistent for $\pi$-almost all $\theta$.

Proof. There is a Borel isomorphism of $\Theta$ onto a complete separable metric space (RAP,
Theorem 13.1.1). Then the posteriors exist in the sense of regular conditional probabilities
of $\theta$ given $X_1, \ldots, X_n$ by RAP, Theorem 10.2.2.

Let $U$ be a non-empty open subset of $\Theta$ (for the original topology, not another metric
obtained via Borel isomorphism). Then $1_{\theta \in U}$ is an integrable function. Its conditional
expectation is

$$E(1_{\theta \in U} | X_1, \ldots, X_n) = \Pr(\theta \in U | X_1, \ldots, X_n) = \pi_{x,n}(U).$$

Let $\mathcal{F}_n$ be the smallest σ-algebra with respect to which $X_1, \ldots, X_n$ are measurable. The
conditional expectations of a fixed integrable function with respect to an increasing
sequence of σ-algebras $\mathcal{F}_n$ form a martingale (RAP, Sec. 10.3), which converges almost
surely, in this case to $1_{\theta \in U}$ (RAP, Theorem 10.5.1), since by empirical identifiability, this
function is measurable with respect to the σ-algebra generated by the union of the $\mathcal{F}_n$.

Since the topology of $\Theta$ has a countable base (RAP, Proposition 2.1.4), let $\{U_k\}_{k=1}^\infty$
be such a base. Let the convergence of the martingale for $U = U_k$ hold for $\Pr_\theta$-almost all
$x$ for all $\theta \notin A_k$ where $\pi(A_k) = 0$. Let $A := \bigcup_{k=1}^\infty A_k$. Then $\pi(A) = 0$. Let $\theta \notin A$. Then
$\theta$ has a neighborhood-base consisting of a subsequence $\{U_{k(j)}\}_{j \ge 1}$. By convergence of the
martingales we have a.s. $\Pr_\theta$, $\pi_{x,n}(U_{k(j)}) \to 1$ as $n \to \infty$ for all $j$, so the posteriors are
consistent at $\theta$. This completes the proof. □
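The martingale property used in the proof can be verified exactly in a toy case. The sketch below is an illustration under assumptions not in the text (a finite parameter set and Bernoulli observations, with function names introduced here): averaging the updated posterior over the predictive distribution of the next observation returns the current posterior.

```python
def update(post, thetas, x):
    """One-step Bayes update of a discrete posterior after observing x in {0, 1}.

    Returns the new posterior and the predictive probability of x given the past.
    """
    weights = [p * (t if x == 1 else 1.0 - t) for p, t in zip(post, thetas)]
    predictive = sum(weights)
    return [w / predictive for w in weights], predictive

def expected_next_posterior(post, thetas):
    """Conditional expectation of the next posterior given the data seen so far."""
    post_if_1, prob_1 = update(post, thetas, 1)
    post_if_0, prob_0 = update(post, thetas, 0)
    return [prob_1 * a + prob_0 * b for a, b in zip(post_if_1, post_if_0)]
```

Here `expected_next_posterior(post, thetas)` agrees with `post` up to rounding, which is the identity $E(\pi_{x,n+1}(U) | X_1, \ldots, X_n) = \pi_{x,n}(U)$ applied to each singleton $U$.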

Thus, if a prior $\pi$ has an atom with $\pi(\{\phi\}) > 0$ for some singleton $\{\phi\}$, $\phi \in \Theta$, then the
posteriors will be consistent for such a $\phi$ under very general conditions. The above proof
can be applied without the assumption that $\Theta$ is a separable metric space, and even if $\phi$
is an isolated point, because the posterior probability of $\{\phi\}$ will converge to 1 a.s. $\Pr_\phi$.
NOTES

At this writing I do not have a reference for Theorem 4.1.1 but it is presumably
known. Schwartz (1965) gave sufficient conditions for consistency of posteriors at particular
$\theta_0$'s. Freedman (1963) and Schwartz (1965) gave examples of non-consistent posteriors.
Proposition 4.1.2 and the example after it are related to their examples. Theorem 4.1.4 is
attributed to Doob (1949). I learned it from Le Cam (1986), p. 616, Prop. 2 and Corollary.